Factor investing is a security selection approach that targets specific elements, or factors, that appear to drive the returns of an asset. The premise behind factor investing is that, using quantitative methods, various risk premia can be identified, isolated, and exploited to the investor's advantage. This idea is the cornerstone of almost all quantitative portfolio management methods. The impact factor investing has had on the financial industry is profound, so much so that the world's largest asset manager (BlackRock) has shifted massive amounts of assets away from "old school" fundamental managers to purely quantitative strategies, and mandated that the fundamental managers who remained use more data-driven processes. Factor investing is not a new idea, but recent developments in computing and mathematics (machine learning) allow practitioners to approach it from an entirely different angle.
The model makes the following underlying assumptions:
1) Predicting future returns is often considered a fool's errand by professionals, because too many factors can impact the price of a given security at any given moment to account for them accurately. Furthermore, the impact and importance of these factors are constantly changing. Despite these challenges, most practitioners continue to try. 2) Selecting the stocks with the highest probability of a desired outcome based on common factors will lead to a better result than selecting stocks based on an expected (predicted) return, assuming accurate predictions are not sustainable over longer periods of time. 3) It is at least as important to avoid bad stocks as it is to invest in good ones. Selecting good stocks helps generate returns; avoiding bad stocks helps minimize losses.
The aforementioned assumptions make clear that there could be significant benefits to framing security selection as a classification problem rather than a regression problem. To solve such a problem we propose using a multilayer perceptron (MLP) to classify stocks into one of two categories (binary classification): 1) at or above the median return of all stocks over the next year, or 2) below the median return over the next year.
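As a minimal sketch of this framing (the tickers and forward returns below are made up purely for illustration), the binary labels can be constructed with a simple median split:

```python
import pandas as pd

# Hypothetical one-year forward returns for five stocks (illustrative only).
fwd_ret = pd.Series({"AAA": 0.12, "BBB": -0.03, "CCC": 0.25, "DDD": 0.05, "EEE": -0.10})

# Class 1: at or above the cross-sectional median; class 0: below it.
labels = (fwd_ret >= fwd_ret.median()).astype(int)
print(labels.to_dict())  # {'AAA': 1, 'BBB': 0, 'CCC': 1, 'DDD': 1, 'EEE': 0}
```

Note that the median split guarantees roughly balanced classes by construction, which sidesteps the class-imbalance issues that plague many financial classification tasks.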
Neural networks offer some advantages for this task over other supervised classification algorithms. First and foremost, MLPs can model non-linear relationships. Second, where traditional factor investing processes require that we identify and manipulate features on our own, neural networks let us attack the problem directly using the data. Last, neural networks are generally better at dealing with noisy data, a very common issue in finance.
The data in question is panel data, which is in a class of its own. Simply put, panel data is neither pure time series data nor pure cross-sectional data; it is in fact both. The multidimensional nature of panel data adds a clear layer of complexity in deciding which model to use. On one hand, the time series aspect of the data suggests RNN or LSTM models would be better suited to the task. On the other hand, the tabular nature of the data suggests that an MLP may be the better fit. In the end, given that our objective was to classify rather than predict a future price, the time element seemed secondary, so we chose to treat the data as cross-sectional and minimize the effect of time. This made the multilayer perceptron the best choice for the task as framed.
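To make the chosen framing concrete, a minimal MLP for this binary classification task might look like the sketch below. The layer sizes, dropout rate, and optimizer settings here are illustrative assumptions, not the tuned architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

n_features = 69  # matches the feature count described below

model = Sequential([
    Input(shape=(n_features,)),
    Dense(64, activation="relu"),
    Dropout(0.2),                    # regularization against noisy financial data
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),  # P(return >= cross-sectional median)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Because the final layer is a single sigmoid unit, the model outputs a probability of beating the median, which can be thresholded at 0.5 for class assignment or ranked directly for portfolio construction.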
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import IsolationForest
from sklearn.metrics import mean_absolute_error
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from tensorflow import keras
from tensorflow.keras.optimizers import Adam
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.regularizers import l1, l2
from keras.callbacks import History, EarlyStopping
from keras.backend import clear_session
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px
import plotly.tools as tls
import pickle
import re
from threading import Event
from yahoo_fin import stock_info as si
from sqlalchemy import create_engine
import psycopg2
import io
import os
from psycopg2 import connect
from sklearn.preprocessing import LabelEncoder
from ipywidgets import widgets, VBox, HBox
import warnings
warnings.simplefilter("ignore")
import portfolio as prt
from numpy.random import seed
import tensorflow as tf
import port_2 as prt
import pandas_market_calendars as mcal
import financial_metrics as fm
import sys, importlib
from sklearn.linear_model import LinearRegression
import port_charts as pch
import feature_creation as fc
import plotly
plotly.offline.init_notebook_mode()
import matplotlib.pyplot as plt
import seaborn as sns
import itertools
import keras_tuner as kt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam
The model uses data from four different categories:
1) Accounting data – data taken directly from a company's financial statements (income statement, balance sheet, cash flow statement). 2) Trading data – data based on market activity over a given period of time. 3) Valuation data – generally market data normalized by accounting data. 4) Technical indicators – moving averages over various windows based on the price of the stock.
In all, the model uses 69 features for 504 different stocks over about 33 months.
The structure of the data may raise questions about the decision to use a standard feed-forward network as opposed to an RNN or an LSTM. This issue was taken into consideration, and the decision was based on the following: 1) The data structure is not strictly sequential; the data used in the model is panel data, which has both time series and cross-sectional elements. 2) The literature and academic justification for using neural networks on panel data is sparse. 3) Given that the model is structured as a classification problem, the time element is less meaningful, and as such the data can be viewed as purely tabular.
The datasets all come from three sources:
1) Refinitiv Eikon (API) – a subscription-based financial data provider that offers access to thousands of corporate and financial data sets for companies and markets around the globe. 2) Yahoo Finance (API) – within the scope of this model, the Yahoo Finance API is used strictly for stock price history, for the sake of convenience. 3) Datastream Web Services – a subscription-based financial and economic data provider that also offers data sets for companies, countries, and markets around the world.
To limit the need for multiple calls and to minimize the use of local storage, the raw data required from the aforementioned sources is stored in a SQL database and updated regularly from a machine with access to an Eikon terminal (required for access to two of the three sources). To view the code for the API calls, see the notebook titled data_collection_five_factor. The functions below import all the required data from the database. THIS CAN TAKE UP TO ONE HOUR TO COMPLETE; do not run if unnecessary.
###SQL Engine
engine = fc.engine
###set random seeds
seed(42)
tf.random.set_seed(42)
def combine_inc_bal():
"""This function imports all accounting data sets (income statment, cashflow statment, balance sheet) and merges them for further use."""
with open("assets/models/jeff_multi_factor/spy_rics.pkl", "rb") as f:
rics = pickle.load(f)
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM bal_sht"
df_balance = pd.read_sql_query(query,conn)
query = "SELECT * FROM inc_stat"
df_income = pd.read_sql_query(query,conn)
query = "SELECT * FROM cf_stat"
df_cf = pd.read_sql_query(query,conn)
inc_bal = pd.DataFrame()
for r in rics:
inc = df_income[df_income['Instrument'] == r]
bal = df_balance[df_balance['Instrument'] == r]
cf = df_cf[df_cf['Instrument'] == r]
inc_bal_1 = inc.merge(bal, on=['Instrument', "Original Announcement Date Time"])
inc_bal_1 = inc_bal_1.merge(cf, on=['Instrument', "Original Announcement Date Time"])
inc_bal_1 = inc_bal_1.rename(columns={'Original Announcement Date Time': 'date',
'EPS - Basic - incl Extraordinary Items, Common - Total': 'eps_basic',
'EPS - Basic - excl Extraordinary Items, Common - Total': 'eps_excl_basic',
'EPS - Diluted - excl Exord Items Applicable to Common Total': 'eps_dil',
'EPS Basic incl Exord, Common - Total, PoP Diff': 'eps_growth',
'Earnings before Interest & Taxes (EBIT)': 'ebit',
'Earnings before Interest Taxes Depreciation & Amortization': 'ebitda',
'Income Available to Common Shares': 'net_inc',
'Revenue from Business Activities - Total': 'rev',
'Total Assets': 'tot_assets',
'Other Assets - Total': 'oth_assets',
'Cash & Short Term Investments - Total': 'cash',
"Total Shareholders' Equity incl Minority Intr & Hybrid Debt": "book_val",
"Tangible Total Equity": "tang_book", "Debt - Total": "debt",
'Total Long Term Capital': "lt_cap",
'Intangible Assets - Total - Net': 'intang',
'Book Value per Share': 'bvps',
"Net Cash Flow from Operating Activities": "ocf",
'Capital Expenditures - Total': "capex",
'Cash Flow from Operations per Share': 'cfo_ps',
"Free Cash Flow to Equity": 'fcfe'})
inc_bal = pd.concat([inc_bal, inc_bal_1])
inc_bal['noa'] = inc_bal["tot_assets"] - inc_bal["oth_assets"]
inc_bal['ebit_bv'] = inc_bal['ebit'] / inc_bal["book_val"]
inc_bal['ebit_nonop'] = inc_bal['ebit'] / inc_bal["oth_assets"]
inc_bal['ebit_op'] = inc_bal['ebit'] / inc_bal['noa']
inc_bal['ebit_tot'] = inc_bal['ebit'] / inc_bal["tot_assets"]
inc_bal['ebit_mgn'] = inc_bal['ebit'] / inc_bal['rev']
inc_bal['net_debt'] = inc_bal['debt'] - inc_bal['cash']
inc_bal['nd_ebitda'] = inc_bal['net_debt'] / inc_bal['ebitda']
inc_bal['ni_op'] = inc_bal['net_inc'] / inc_bal['noa']
inc_bal['ni_tot'] = inc_bal['net_inc'] / inc_bal["tot_assets"]
inc_bal['ni_mgn'] = inc_bal['net_inc'] / inc_bal['rev']
inc_bal['ocf_bv'] = inc_bal['ocf'] / inc_bal["book_val"]
inc_bal['ocf_op'] = inc_bal['ocf'] / inc_bal['noa']
inc_bal['ocf_tot'] = inc_bal['ocf'] / inc_bal["tot_assets"]
inc_bal['ocf_mgn'] = inc_bal['ocf'] / inc_bal['rev']
inc_bal['ocf_ce'] = inc_bal['ocf'] / inc_bal['lt_cap']
inc_bal['fcf_bv'] = inc_bal['fcfe'] / inc_bal["book_val"]
inc_bal['fcf_op'] = inc_bal['fcfe'] / inc_bal['noa']
inc_bal['fcf_tot'] = inc_bal['fcfe'] / inc_bal["tot_assets"]
inc_bal['fcf_mgn'] = inc_bal['fcfe'] / inc_bal['rev']
inc_bal['fcf_ce'] = inc_bal['fcfe'] / inc_bal['lt_cap']
query = "SELECT * FROM qual_dat"
qual = pd.read_sql_query(query,conn)
qual = qual.rename(columns={"Original Announcement Date Time": 'date',
"Return on Average Common Equity - %, TTM": "roe",
"Return on Average Total Assets - %, TTM": "roa",
"Return on Average Total Long Term Capital - %, TTM": "roce",
"Return on Invested Capital - %, TTM": "roic",
"Total Debt Percentage of Total Equity": 'd_e'})
qual['date'] = pd.to_datetime(qual['date']).dt.date
inc_bal['date'] = pd.to_datetime(inc_bal['date']).dt.date
inc_bal = inc_bal.merge(qual, on=['date', 'Instrument'], how='outer')
return inc_bal
def mkt_cap_feat():
"""Caluclates average market cap over various timeframes """
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM mkt_cap"
mkt_cap = pd.read_sql_query(query,conn)
mkt_cap = mkt_cap.set_index('Dates')
mkt_cap.columns = [re.findall(r"(?<=<)(.*?)(?=>)", i)[0] for i in mkt_cap.columns]
mkt_cap = mkt_cap * 1000000
mkt_cap12m = mkt_cap.rolling(252).mean().dropna(axis=0, how='all')
mkt_cap12m = mkt_cap12m.rename(columns={i: '{}_12m_avg_mktcap'.format(i) for i in mkt_cap12m.columns})
mkt_cap3m = mkt_cap.rolling(63).mean().dropna(axis=0, how='all')
mkt_cap3m = mkt_cap3m.rename(columns={i: '{}_3m_avg_mktcap'.format(i) for i in mkt_cap3m.columns})
mkt_cap6m = mkt_cap.rolling(126).mean().dropna(axis=0, how='all')
mkt_cap6m = mkt_cap6m.rename(columns={i: '{}_6m_avg_mktcap'.format(i) for i in mkt_cap6m.columns})
mkt_cap_features = mkt_cap12m.join(mkt_cap6m, how='inner')
mkt_cap_features = mkt_cap_features.join(mkt_cap3m, how='inner')
mkt_cap_features.index.name = "Date"
mkt_cap_features.to_csv('assets/models/jeff_multi_factor/mkt_cap_feats.csv')
return mkt_cap_features
def merge_vol_mkt():
"""combines market cap data and volume data"""
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM mkt_cap"
mkt_cap = pd.read_sql_query(query,conn)
mkt_cap = mkt_cap.set_index('Dates')
mkt_cap.columns = [re.findall(r"(?<=<)(.*?)(?=>)", i)[0] for i in mkt_cap.columns]
mkt_cap = mkt_cap * 1000000
mkt_cap12m = mkt_cap.rolling(252).mean().dropna(axis=0, how='all')
mkt_cap12m = mkt_cap12m.rename(columns={i: '{}_12m_avg_mktcap'.format(i) for i in mkt_cap12m.columns})
mkt_cap3m = mkt_cap.rolling(63).mean().dropna(axis=0, how='all')
mkt_cap3m = mkt_cap3m.rename(columns={i: '{}_3m_avg_mktcap'.format(i) for i in mkt_cap3m.columns})
mkt_cap6m = mkt_cap.rolling(126).mean().dropna(axis=0, how='all')
mkt_cap6m = mkt_cap6m.rename(columns={i: '{}_6m_avg_mktcap'.format(i) for i in mkt_cap6m.columns})
mkt_cap_features = mkt_cap12m.join(mkt_cap6m, how='inner')
mkt_cap_features = mkt_cap_features.join(mkt_cap3m, how='inner')
mkt_cap_features.index.name = "Date"
query = "SELECT * FROM vol_12"
vol_12 = pd.read_sql_query(query,conn)
query = "SELECT * FROM vol_6"
vol_6 = pd.read_sql_query(query,conn)
query = "SELECT * FROM vol_3"
vol_3 = pd.read_sql_query(query,conn)
vol = vol_12.set_index('Date').join(vol_6.set_index('Date'), how = 'outer')
vol = vol.join(vol_3.set_index('Date'), how = 'outer')
vol.index = pd.to_datetime(vol.index)
mkt_cap_features.index = pd.to_datetime(mkt_cap_features.index)
trading = mkt_cap_features.join(vol, how='inner')
trading.index.name = 'date'
trading.index = pd.to_datetime(trading.index).date
with open("assets/models/jeff_multi_factor/spy_rics.pkl", "rb") as f:
rics = pickle.load(f)
df = pd.DataFrame()
for r in rics:
try:
vol_12 = trading['{}_12m_volume'.format(r)].to_frame(name='12m_volume')
vol_6 = trading['{}_6m_volume'.format(r)].to_frame(name='6m_volume')
vol_3 = trading['{}_3m_volume'.format(r)].to_frame(name='3m_volume')
mkt_12 = trading['{}_12m_avg_mktcap'.format(r)].to_frame(name='12m_avg_mkt_cap')
mkt_6 = trading['{}_6m_avg_mktcap'.format(r)].to_frame(name='6m_avg_mkt_cap')
mkt_3 = trading['{}_3m_avg_mktcap'.format(r)].to_frame(name='3m_avg_mkt_cap')
trad = vol_12.join(vol_6, how='outer')
trad = trad.join(vol_3, how='outer')
trad = trad.join(mkt_12, how='outer')
trad = trad.join(mkt_3, how='outer')
trad = trad.join(mkt_6, how='outer')
trad = trad.sort_index(ascending=False)
trad['Instrument'] = r
df = pd.concat([df, trad])
except KeyError:
# skip instruments missing one of the required trading columns
pass
return df
def creat_labs_vol():
"""Creates labels, volatility data, and momentum data"""
with open("assets/models/jeff_multi_factor/spy_rics.pkl", "rb") as f:
rics = pickle.load(f)
tics = [i.split('.')[0] for i in rics]
tics = ['BRK-B' if i == 'BRKb' else i for i in tics]
tics = ['BF-B' if i == 'BFb' else i for i in tics]
df = pd.DataFrame()
px_dat = pd.DataFrame()
timer = Event()
for s in tics:
try:
px = si.get_data(s)
px = px.rename(columns={"ticker": "Instrument"})
prices = px.filter(['Instrument', 'close'])
px_dat = pd.concat([px_dat, prices])
ret_1yr = px['adjclose'].pct_change(252).to_frame(name='1yr_ret').shift(-252)
ret_3m = px['adjclose'].pct_change(63).to_frame(name='3m_ret').shift(-63)
ret_6m = px['adjclose'].pct_change(126).to_frame(name='6m_ret').shift(-126)
vol_1yr = px['adjclose'].pct_change().rolling(252).std().to_frame(name='1yr_vol') * np.sqrt(252)
vol_3m = px['adjclose'].pct_change().rolling(63).std().to_frame(name='3mth_vol') * np.sqrt(252)
vol_6m = px['adjclose'].pct_change().rolling(126).std().to_frame(name="6mth_vol") * np.sqrt(252)
mom_1yr = px['adjclose'].pct_change(252).to_frame(name='1yr_mom')
mom_3m = px['adjclose'].pct_change(63).to_frame(name='3m_mom')
mom_6m = px['adjclose'].pct_change(126).to_frame(name='6m_mom')
px_based = ret_1yr.join(ret_3m, how='outer')
px_based = px_based.join(ret_6m, how='outer')
px_based = px_based.join(vol_1yr, how='outer')
px_based = px_based.join(vol_6m)
px_based = px_based.join(vol_3m)
px_based = px_based.join(mom_1yr)
px_based = px_based.join(mom_6m)
px_based = px_based.join(mom_3m)
px_based['Instrument'] = s
df = pd.concat([df, px_based])
print('{}: no {} of {} complete'.format(s, len(df['Instrument'].dropna().unique()), len(tics)))
except Exception:
# likely a transient Yahoo API failure; wait five seconds and retry once
timer.wait(5)
px = si.get_data(s)
px = px.rename(columns={"ticker": "Instrument"})
prices = px.filter(['Instrument', 'close'])
px_dat = pd.concat([px_dat, prices])
ret_1yr = px['adjclose'].pct_change(252).to_frame(name='1yr_ret').shift(-252)
ret_3m = px['adjclose'].pct_change(63).to_frame(name='3m_ret').shift(-63)
ret_6m = px['adjclose'].pct_change(126).to_frame(name='6m_ret').shift(-126)
vol_1yr = px['adjclose'].pct_change().rolling(252).std().to_frame(name='1yr_vol') * np.sqrt(252)
vol_3m = px['adjclose'].pct_change().rolling(63).std().to_frame(name='3mth_vol') * np.sqrt(252)
vol_6m = px['adjclose'].pct_change().rolling(126).std().to_frame(name="6mth_vol") * np.sqrt(252)
mom_1yr = px['adjclose'].pct_change(252).to_frame(name='1yr_mom')
mom_3m = px['adjclose'].pct_change(63).to_frame(name='3m_mom')
mom_6m = px['adjclose'].pct_change(126).to_frame(name='6m_mom')
px_based = ret_1yr.join(ret_3m, how='outer')
px_based = px_based.join(ret_6m, how='outer')
px_based = px_based.join(vol_1yr, how='outer')
px_based = px_based.join(vol_6m)
px_based = px_based.join(vol_3m)
px_based = px_based.join(mom_1yr)
px_based = px_based.join(mom_6m)
px_based = px_based.join(mom_3m)
px_based['Instrument'] = s
df = pd.concat([df, px_based])
print('{}: no {} of {} complete'.format(s, len(df['Instrument'].dropna().unique()), len(tics)))
engine = create_engine('postgresql+psycopg2://postgres:********@********:5432/697_temp')
px_dat = px_dat.reset_index()
px_dat.to_sql("px_dat", con = engine, if_exists = 'replace', index = False)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
px_dat.to_csv('assets/models/jeff_multi_factor/close_prices.csv', sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, "px_dat")
conn.commit()
os.remove('assets/models/jeff_multi_factor/close_prices.csv')
print("done")
return df
def valuation():
"""Creates valuation features including EV and Px pased multiples"""
df = combine_inc_bal()
df['date'] = pd.to_datetime(df['date']).dt.date
rics = list(df['Instrument'])
tics = [i.split('.')[0] for i in rics]
tics = ['BRK-B' if i == 'BRKb' else i for i in tics]
tics = ['BF-B' if i == 'BFb' else i for i in tics]
df['Instrument'] = tics
df = df.dropna(subset=['date'])
acts = df.filter(['Instrument', 'date', 'eps_excl_basic', 'bvps', 'cfo_ps', 'ebit',
'ebitda', 'rev', 'debt', 'cash'])
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM px_dat"
px = pd.read_sql_query(query,conn)
px = px.rename(columns = {"index":"date"})
px['date'] = pd.to_datetime(px['date']).dt.date
valuation_df = px.merge(acts, on=['date', 'Instrument'], how='outer').fillna(method='ffill')
valuation_df = valuation_df.drop_duplicates()
valuation_df['p_e'] = valuation_df['close'] / valuation_df["eps_excl_basic"]
valuation_df['p_b'] = valuation_df['close'] / valuation_df["bvps"]
valuation_df['p_cf'] = valuation_df['close'] / valuation_df["cfo_ps"]
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM mkt_cap"
mkt_cap = pd.read_sql_query(query,conn)
mkt_cap = mkt_cap.set_index('Dates')
mkt_cap.columns = [re.findall(r"(?<=<)(.*?)(?=>)", i)[0] for i in mkt_cap.columns]
mkt_cap.columns = [i.split('.')[0] for i in mkt_cap.columns]
mkt_cap = mkt_cap * 1000000
mkt_cap = mkt_cap.reset_index()
mkt_cap = mkt_cap.rename(columns={'Dates': 'date'})
mkt_cap['date'] = pd.to_datetime(mkt_cap['date']).dt.date
mkt_values = pd.melt(mkt_cap, id_vars='date', var_name='Instrument', value_name='mkt_cap')
valuation_df = valuation_df.merge(mkt_values, on=['date', 'Instrument'], how='outer').dropna()
valuation_df['ev'] = valuation_df['mkt_cap'] + valuation_df['debt'] - valuation_df['cash']
valuation_df['ev_ebit'] = valuation_df['ev'] / valuation_df['ebit']
valuation_df['ev_ebitda'] = valuation_df['ev'] / valuation_df['ebitda']
valuation_df['ev_sales'] = valuation_df['ev'] / valuation_df['rev']
valuation_df = valuation_df.filter(['date', 'Instrument', 'p_e', 'p_b', 'p_cf', 'ev',
'ev_ebit', 'ev_ebitda', 'ev_sales'])
return valuation_df
def technicals():
"""calculates 200 day and 50 day exponentially weighted moving averages """
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM px_dat"
px = pd.read_sql_query(query,conn)
px = px.rename(columns = {"index":"date"})
px['date'] = pd.to_datetime(px['date']).dt.date
df = pd.DataFrame()
for s in px['Instrument'].unique():
stock = px[px['Instrument'] == s].copy()  # copy to avoid SettingWithCopyWarning on the assignments below
stock['200_ma'] = stock['close'].ewm(span=200).mean()
stock['50_ma'] = stock['close'].ewm(span=50).mean()
df = pd.concat([df, stock])
return df
def merge_data(act, trd, val, tec, labs):
"""Combines all datasets"""
mkt_vol = trd.reset_index()
mkt_vol = mkt_vol.rename(columns={'index': 'date'})
rics = list(mkt_vol['Instrument'])
tics = [i.split('.')[0] for i in rics]
tics = ['BRK-B' if i == 'BRKb' else i for i in tics]
tics = ['BF-B' if i == 'BFb' else i for i in tics]
mkt_vol['Instrument'] = tics
mkt_vol['date'] = pd.to_datetime(mkt_vol['date']).dt.date
mom_labs = labs
mom_labs = mom_labs.reset_index()
mom_labs = mom_labs.rename(columns={'index': 'date'})
mom_labs['date'] = pd.to_datetime(mom_labs['date']).dt.date
labs = mom_labs.filter(['Instrument', 'date', '1yr_ret', '3m_ret', '6m_ret'])
mom = mom_labs.drop(['1yr_ret', '3m_ret', '6m_ret'], axis=1)
val = val
val['date'] = pd.to_datetime(val['date']).dt.date
tech = tec
tech['date'] = pd.to_datetime(tech['date']).dt.date
act = act
rics = list(act['Instrument'])
tics = [i.split('.')[0] for i in rics]
tics = ['BRK-B' if i == 'BRKb' else i for i in tics]
tics = ['BF-B' if i == 'BFb' else i for i in tics]
act['Instrument'] = tics
act['date'] = pd.to_datetime(act['date']).dt.date
data = mkt_vol.merge(mom, on=['date', 'Instrument'], how='inner')
data = data.merge(val, on=['date', 'Instrument'], how='inner')
data = data.merge(tech, on=['date', 'Instrument'], how='inner')
data['date'] = pd.to_datetime(data['date']).dt.date
data = data.merge(act, on=['Instrument', 'date'], how='outer')
data = data.sort_values(by=['Instrument', 'date'])
data = data.drop_duplicates()
data = data.fillna(method='ffill').dropna()
labs = labs.sort_values(by=['Instrument', 'date'])
data = data.merge(labs, on=['Instrument', 'date'], how='outer')
engine = create_engine('postgresql+psycopg2://postgres:********@********:5432/697_temp')
data.to_sql("raw_data", con = engine, if_exists = 'replace', index = False, chunksize = 500)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
data.to_csv('assets/models/jeff_multi_factor/aggregate_features.csv', sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, "raw_data")
conn.commit()
os.remove('assets/models/jeff_multi_factor/aggregate_features.csv')
print("done")
return data
def create_blocks():
"""converts dataset to monthly data, and generates data blocks"""
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM raw_data"
data = pd.read_sql_query(query,conn)
data["date"] = pd.to_datetime(data["date"])
reduced = data.dropna(subset=['12m_volume'], axis=0)
monthly = reduced.loc[reduced.date.isin(reduced.date + pd.offsets.BMonthEnd(1))].copy()  # copy to avoid SettingWithCopyWarning on the assignments below
monthly["q"] = monthly['date'].dt.quarter
monthly['yr'] = monthly["date"].dt.year
monthly['timeframe'] = monthly["q"].astype(str)+"_"+monthly["yr"].astype(str)
for t in monthly["timeframe"].unique():
tf = monthly[monthly['timeframe']==t]
tf = tf.drop(["q", "yr", "timeframe"], axis = 1)
tf.to_csv("assets/models/jeff_multi_factor/block_data{}.csv".format(t))
monthly = monthly.drop(["q", "yr", "timeframe"], axis = 1)
engine = create_engine('postgresql+psycopg2://postgres:********@********:5432/697_temp')
monthly.to_sql("monthly", con = engine, if_exists = 'replace', index = False, chunksize = 500)
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
monthly.to_csv('assets/models/jeff_multi_factor/monthly.csv', sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_from(output, "monthly")
conn.commit()
os.remove('assets/models/jeff_multi_factor/monthly.csv')
print("done")
return monthly
def run_feature_engine():
"""This function will run all the functions used to compile the data needed for the model from the database, and will generate blocks this function will take about 1 hour to run"""
act = combine_inc_bal()
print('act_done')
trd = merge_vol_mkt()
print('trd_done')
labs = creat_labs_vol()
print('labs_done')
val = valuation()
print('val_done')
tec = technicals()
print('tec_done')
raw = merge_data(act, trd, val, tec, labs)
print("DB updated")
monthly = create_blocks()
print("Done")
return monthly
#to recreate features uncomment below, run, and wait...or just move on to the next cell to import pre-engineered features
#run_feature_engine()
###helper
def batch(iterable, n=1):
"""Helper function that assist in managing batch sizes"""
l = len(iterable)
for ndx in range(0, l, n):
yield iterable[ndx:min(ndx + n, l)]
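For instance (the helper is repeated here so the snippet is self-contained, and the tickers are arbitrary examples), batching five tickers in groups of two yields:

```python
def batch(iterable, n=1):
    """Yield successive chunks of size n (same helper as above)."""
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

chunks = list(batch(["AAPL", "MSFT", "AMZN", "GOOG", "META"], n=2))
# -> [['AAPL', 'MSFT'], ['AMZN', 'GOOG'], ['META']]
```

The final chunk is simply shorter when the list length is not a multiple of n, so no padding logic is needed downstream.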
def import_monthly():
"""Imports our dataset of monthly features"""
conn = connect(dbname = '697_temp', user = 'postgres', host = '********', password = '********')
cur = conn.cursor()
query = "SELECT * FROM monthly"
data = pd.read_sql_query(query,conn)
data = data.sort_values(['Instrument', 'date'])
data['date'] = pd.to_datetime(data['date'])
return data
data = import_monthly()
The following data evaluation process has three purposes.
The histogram charts below allow us to approximate the distribution of the various features fed into the model. Due to the asymmetric nature of many of the features, and the difficulty of identifying observations in the tails of the distribution via standard distribution plots, the data below was transformed to a logarithmic scale. Once transformed, it is clear that the data is fraught with outliers.
reduced = data.dropna(subset=['12m_volume'], axis = 0)
dist_dat = reduced.drop(['3m_ret', '6m_ret', 'Instrument', 'date'], axis = 1)
dist_dat = dist_dat.replace([np.inf, -np.inf], np.nan)
dist_dat = dist_dat.dropna()
scaled = StandardScaler().fit_transform(dist_dat)
dist_dat_2 = pd.DataFrame(scaled, columns = dist_dat.columns)
xaxes = ['x_{}_log'.format(i) for i in dist_dat.columns]
yaxes = ['freq' for i in dist_dat.columns]
titles = dist_dat.columns
palette = itertools.cycle(sns.color_palette())
f,a = plt.subplots(14,5, figsize = (20,30))
a = a.ravel()
for idx, ax in enumerate(a[:len(dist_dat_2.columns)]): #guard against spare axes in the grid
c = next(palette)
ax.hist(dist_dat_2[dist_dat_2.columns[idx]],histtype = u'stepfilled', density=True, color = c, log = True)
ax.set_title(titles[idx])
ax.set_xlabel(xaxes[idx])
ax.set_ylabel(yaxes[idx])
plt.tight_layout()
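One way to quantify the outliers the log-scale histograms reveal is with the IsolationForest imported at the top of the notebook. The column below is a synthetic stand-in for one standardized feature, an assumption made so the example runs on its own:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for one standardized feature: mostly well-behaved
# observations plus a few extreme values, mimicking the heavy tails above.
feature = np.concatenate([rng.normal(0, 1, 500), [15.0, -12.0, 20.0]]).reshape(-1, 1)

iso = IsolationForest(contamination=0.01, random_state=0).fit(feature)
flags = iso.predict(feature)      # -1 marks suspected outliers
print(int((flags == -1).sum()))   # a handful of flagged observations
```

The same `fit`/`predict` pattern could be applied column by column to `dist_dat_2` to flag, rather than merely eyeball, the extreme observations.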
Below is a basic correlation matrix. It makes very clear that, for the most part, correlations are weak (close to zero), with pockets of stronger positive correlations among certain features (accounting features in particular). We can deduce the following from the correlation matrix:
###interactive correlation matrix
reduced = data.dropna(subset=['12m_volume'], axis = 0)
labels = reduced.filter(['date','Instrument','1yr_ret', '3m_ret', '6m_ret'])
features = reduced.drop(['1yr_ret', '3m_ret', '6m_ret'], axis = 1)
corr_dat = reduced.drop(['3m_ret', '6m_ret'], axis = 1).dropna()
corr_dat = corr_dat.groupby('date').corr()[['1yr_ret']].reset_index()
corr_dat = corr_dat[corr_dat['level_1']!='1yr_ret']
corr = reduced.drop(['3m_ret', '6m_ret'], axis = 1).dropna()
corr = corr.corr()
fig = go.Figure(data=go.Heatmap(
z = corr,
x = corr.index,
y = corr.columns,
xgap =2,
ygap =2,
colorscale="RdBu"
))
fig.update_layout(
title = "Correlation Matrix",
title_x = 0.5,
paper_bgcolor='white',
plot_bgcolor='white',
height = 900,
width = 900,
margin=dict(
l=30,
r=30,
b=30,
t=40,
),
)
fig.update_xaxes(tickfont = dict(size = 8))
fig.update_yaxes(tickfont = dict(size = 8))
fig.show()
We are interested not only in the correlation between the features and one-year forward returns, but also in the distribution of those correlations. This allows us to better understand the impact any outliers may be having on the data. The box plots below illustrate this distribution. The correlations are computed on a date-by-date basis for the entire cross-section of stocks. The plot shows that for the most part correlations revolve around zero, but can go as high as 0.7 at the higher end for one-year volatility and as low as -0.47 for three-month momentum. The key takeaway from this segment is that not only are correlations between features and one-year forward returns low, as demonstrated previously by the correlation matrix, but the dispersion within the correlation data of every feature to one-year returns is significant in many cases.
# Build one box-plot figure per block of ten features. Each box shows the
# distribution, over dates, of a feature's cross-sectional correlation to
# 1-year forward returns.
colors = ['rgb(136, 204, 238)', 'rgb(204, 102, 119)', 'rgb(221, 204, 119)', 'rgb(17, 119, 51)',
          'rgb(51, 34, 136)', 'rgb(170, 68, 153)', 'rgb(68, 170, 153)', 'rgb(153, 153, 51)',
          'rgb(136, 34, 85)', 'rgb(102, 17, 0)']

def make_box_fig(start, stop, title):
    features = corr_dat['level_1'].unique()[start:stop]
    fig = go.Figure()
    for feat, cl in zip(features, colors):
        fig.add_trace(go.Box(
            y=list(corr_dat[corr_dat['level_1'] == feat]['1yr_ret']),
            name=feat,
            whiskerwidth=0.2,
            line_width=1,
            fillcolor=cl,
            boxmean='sd'  # overlay the mean and standard deviation on each box
        ))
    fig.update_layout(
        title=title,
        yaxis=dict(autorange=True, showgrid=True, zeroline=True, dtick=5,
                   gridcolor='rgb(255, 255, 255)', gridwidth=1,
                   zerolinecolor='rgb(255, 255, 255)', zerolinewidth=2),
        margin=dict(l=40, r=30, b=80, t=100),
        height=400,
        width=700,
        paper_bgcolor='rgb(243, 243, 243)',
        plot_bgcolor='rgb(243, 243, 243)',
        showlegend=False
    )
    fig.update_xaxes(tickfont=dict(size=8))
    return go.FigureWidget(fig)

f1 = make_box_fig(0, 10, 'Correlation to 1 Year forward return (Features 1-10)')
f2 = make_box_fig(10, 20, 'Correlation to 1 Year forward return (Features 11-20)')
f3 = make_box_fig(20, 30, 'Correlation to 1 Year forward return (Features 21-30)')
f4 = make_box_fig(30, 40, 'Correlation to 1 Year forward return (Features 31-40)')
f5 = make_box_fig(40, 50, 'Correlation to 1 Year forward return (Features 41-50)')
f6 = make_box_fig(50, 60, 'Correlation to 1 Year forward return (Features 51-60)')
f7 = make_box_fig(60, None, 'Correlation to 1 Year forward return (Features 61-69)')
fig_subplot = VBox([HBox([f1, f2]), HBox([f3, f4]), HBox([f5, f6]), HBox([f7])])
fig_subplot
Effectively, the data has forced us to one of two conclusions:
OR
To eliminate outliers we chose the isolation forest anomaly detection algorithm. The general idea behind the isolation forest is that, in large amounts of data, anomalies are easier to isolate than normal points. The algorithm works like a decision tree that recursively partitions the data at random split points between the minimum and maximum of a randomly chosen feature. Because outliers lie far from the bulk of the data, they tend to be isolated closer to the root of the tree, i.e., after fewer random splits, and this short path length is what flags them as anomalies.
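This intuition is easy to verify on synthetic data: a point far from a tight cluster is isolated after very few splits and receives the anomaly label -1. A minimal sketch (the data and variable names here are ours, not from the notebook):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# 100 inliers tightly clustered at the origin, plus one obvious outlier.
inliers = rng.normal(0, 0.1, size=(100, 2))
X_toy = np.vstack([inliers, [[10.0, 10.0]]])

# Each tree isolates points with random splits; points isolated near the
# root (short average path length) get the anomaly label -1 from predict.
iso = IsolationForest(contamination=0.05, random_state=0)
labels = iso.fit_predict(X_toy)
print(labels[-1])  # the far-away point is flagged as an anomaly (-1)
```

The `contamination` parameter sets the expected fraction of anomalies, which determines where the score threshold is drawn; the actual training cell below uses 0.2.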
As previously discussed, the model's objective is to classify stocks into top 50% gainers and bottom 50% gainers. We decided on the median because, in addition to being a widely used measure of central tendency, it ensures a fairly balanced training set regardless of the data's distribution. This forces the model to evaluate stocks on a relative basis, which we believe to be advantageous despite the potential misclassifications.
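The balancing property of a median-based label can be checked directly: even on a heavily skewed return distribution, labeling at-or-above-median as 1 splits the sample in half. A small sketch with synthetic returns (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
# Heavily right-skewed "returns": a lognormal's mean sits well above its median.
returns = rng.lognormal(mean=0.0, sigma=1.0, size=1000) - 1.0

# Same labeling rule as the training set: 1 if at or above the median, else 0.
# By construction roughly half the labels are 1, however skewed the returns.
labels = (returns >= np.median(returns)).astype(int)
print(labels.mean())  # → 0.5
```

A mean-based threshold, by contrast, would put well under half the stocks in the positive class whenever returns are right-skewed.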
### Outlier detection and removal using isolation forests
training_block = pd.read_csv("assets/models/jeff_multi_factor/block_data1_2019.csv", index_col = 0)
training_block = training_block.drop(['3m_ret', '6m_ret', 'Instrument', 'date'], axis = 1)
training_block['label'] = [1 if i >= training_block['1yr_ret'].median() else 0 for i in training_block['1yr_ret']]
training_block = training_block.drop(['1yr_ret'], axis = 1)
training_block = training_block.replace([np.inf, -np.inf], np.nan)
training_block = training_block.dropna()
data = training_block.values
X, y = data[:, :-1], data[:, -1]
print(X.shape, y.shape)
iso = IsolationForest(contamination=0.2)  # flag roughly 20% of observations as outliers
yhat = iso.fit_predict(X)
mask = yhat != -1  # -1 marks anomalies; keep only the inliers
X, y = X[mask, :], y[mask]
print(X.shape, y.shape)
(1467, 69) (1467,) (1173, 69) (1173,)
Based on the isolation forest model used to detect outliers, we have discarded about 20% of the original observations (1,467 down to 1,173) as extreme. The remaining data must now be scaled and split into training and test sets. Given the balanced nature of the data and the relatively small number of observations, we decided to split the data into 80% train and 20% test.
### Training set normalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
scaler = StandardScaler().fit(X_train)  # fit on the training set only to avoid leakage
scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)
We’ve used the Keras package to construct our model: a sequential model with three hidden layers of multiple nodes, all using the ReLU activation function, and an output layer with a single node using the sigmoid activation function. We use the binary cross-entropy loss function and the Adam optimizer. We also include dropout after every hidden layer to help minimize potential overfitting, and use l2 regularization to mitigate any effects of multicollinearity. Lastly, we use the Keras RandomSearch tuner to identify the ideal number of nodes in each hidden layer and the most effective learning rate for the optimizer. The tuner's objective is set to minimize the loss on the validation (test) set.
def build_mod(hp):
    tf.random.set_seed(42)
    # Search ranges for the number of units in each hidden layer
    hp_units1 = hp.Int('units1', min_value=8, max_value=16, step=1)
    hp_units2 = hp.Int('units2', min_value=6, max_value=8, step=1)
    hp_units3 = hp.Int('units3', min_value=2, max_value=6, step=1)
    model = Sequential()
    model.add(Dense(units=hp_units1, activation='relu', input_dim=69, activity_regularizer=l2(0.002)))
    model.add(Dropout(0.5))
    model.add(Dense(units=hp_units2, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(units=hp_units3, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(hp.Choice("learning_rate", values=[1e-2, 1e-3, 1e-4])),
                  metrics=['binary_accuracy', 'Precision', 'Recall'])
    return model

tuner = kt.RandomSearch(build_mod, objective=kt.Objective("val_loss", direction="min"), max_trials=25, seed=42, overwrite=True)
tuner.search(scaled_X_train, y_train, epochs=150, batch_size=32, validation_data=(scaled_X_test, y_test))
Trial 25 Complete [00h 00m 08s]
val_loss: 0.696549654006958
Best val_loss So Far: 0.6365458965301514
Total elapsed time: 00h 03m 05s
INFO:tensorflow:Oracle triggered exit
tuner.results_summary(1)
Results summary
Results in .\untitled_project
Showing 1 best trials
Objective(name='val_loss', direction='min')
Trial summary
Hyperparameters:
units1: 15
units2: 8
units3: 4
learning_rate: 0.01
Score: 0.6365458965301514
Once the tuner has identified the best model, we retrain it with the identified hyperparameters and record the training history, so that we can evaluate the model's performance over all iterations based on the following metrics:
### Retrain the model and record history for evaluation
tf.random.set_seed(42)
#es = EarlyStopping(monitor='val_loss', min_delta=0.001,patience=20, restore_best_weights=True)
model = Sequential()
model.add(Dense(15, activation='relu', input_dim = 69, activity_regularizer = l2(0.002)))
model.add(Dropout(0.5))
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate = 0.01), metrics=['binary_accuracy', 'Precision', 'Recall'])
history = model.fit(scaled_X_train, y_train, epochs=150, batch_size=32, validation_data = (scaled_X_test, y_test))
Epoch 1/150 30/30 [==============================] - 1s 8ms/step - loss: 0.7318 - binary_accuracy: 0.5128 - precision: 0.5327 - recall: 0.2420 - val_loss: 0.7153 - val_binary_accuracy: 0.5319 - val_precision: 0.8000 - val_recall: 0.0354 Epoch 2/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7173 - binary_accuracy: 0.5032 - precision: 0.5281 - recall: 0.0998 - val_loss: 0.7071 - val_binary_accuracy: 0.5277 - val_precision: 0.6000 - val_recall: 0.0531 Epoch 3/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7060 - binary_accuracy: 0.5043 - precision: 0.5069 - recall: 0.4692 - val_loss: 0.7050 - val_binary_accuracy: 0.4809 - val_precision: 0.4809 - val_recall: 1.0000 Epoch 4/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7036 - binary_accuracy: 0.5011 - precision: 0.5090 - recall: 0.1805 - val_loss: 0.7004 - val_binary_accuracy: 0.5404 - val_precision: 0.7778 - val_recall: 0.0619 Epoch 5/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6970 - binary_accuracy: 0.5330 - precision: 0.6571 - recall: 0.1465 - val_loss: 0.6989 - val_binary_accuracy: 0.5404 - val_precision: 0.5556 - val_recall: 0.2212 Epoch 6/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7012 - binary_accuracy: 0.4979 - precision: 0.5000 - recall: 0.4777 - val_loss: 0.6983 - val_binary_accuracy: 0.5234 - val_precision: 0.5088 - val_recall: 0.2566 Epoch 7/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6974 - binary_accuracy: 0.5139 - precision: 0.5773 - recall: 0.1189 - val_loss: 0.6996 - val_binary_accuracy: 0.5489 - val_precision: 0.7692 - val_recall: 0.0885 Epoch 8/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7000 - binary_accuracy: 0.5021 - precision: 0.5303 - recall: 0.0743 - val_loss: 0.6971 - val_binary_accuracy: 0.5362 - val_precision: 0.6250 - val_recall: 0.0885 Epoch 9/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6922 - binary_accuracy: 
0.5373 - precision: 0.6907 - recall: 0.1423 - val_loss: 0.6956 - val_binary_accuracy: 0.5489 - val_precision: 0.5714 - val_recall: 0.2478 Epoch 10/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6963 - binary_accuracy: 0.5043 - precision: 0.5064 - recall: 0.5053 - val_loss: 0.6973 - val_binary_accuracy: 0.5447 - val_precision: 0.5268 - val_recall: 0.5221 Epoch 11/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6973 - binary_accuracy: 0.5160 - precision: 0.5445 - recall: 0.2208 - val_loss: 0.6973 - val_binary_accuracy: 0.5660 - val_precision: 0.5932 - val_recall: 0.3097 Epoch 12/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7007 - binary_accuracy: 0.5341 - precision: 0.5955 - recall: 0.2251 - val_loss: 0.6995 - val_binary_accuracy: 0.5191 - val_precision: 0.5000 - val_recall: 0.3540 Epoch 13/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6910 - binary_accuracy: 0.5416 - precision: 0.5627 - recall: 0.3907 - val_loss: 0.6977 - val_binary_accuracy: 0.5362 - val_precision: 0.5128 - val_recall: 0.7080 Epoch 14/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6908 - binary_accuracy: 0.5405 - precision: 0.5654 - recall: 0.3673 - val_loss: 0.6935 - val_binary_accuracy: 0.5532 - val_precision: 0.5625 - val_recall: 0.3186 Epoch 15/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6958 - binary_accuracy: 0.5544 - precision: 0.6497 - recall: 0.2442 - val_loss: 0.6961 - val_binary_accuracy: 0.5574 - val_precision: 0.6216 - val_recall: 0.2035 Epoch 16/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6916 - binary_accuracy: 0.5501 - precision: 0.6815 - recall: 0.1953 - val_loss: 0.7000 - val_binary_accuracy: 0.5404 - val_precision: 0.5490 - val_recall: 0.2478 Epoch 17/150 30/30 [==============================] - 0s 1ms/step - loss: 0.7057 - binary_accuracy: 0.5288 - precision: 0.5721 - recall: 0.2442 - val_loss: 0.6958 - val_binary_accuracy: 0.5489 - 
val_precision: 0.5507 - val_recall: 0.3363 Epoch 18/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6896 - binary_accuracy: 0.5640 - precision: 0.6890 - recall: 0.2399 - val_loss: 0.6914 - val_binary_accuracy: 0.5447 - val_precision: 0.5319 - val_recall: 0.4425 Epoch 19/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6812 - binary_accuracy: 0.5629 - precision: 0.5905 - recall: 0.4225 - val_loss: 0.6917 - val_binary_accuracy: 0.5404 - val_precision: 0.5410 - val_recall: 0.2920 Epoch 20/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6901 - binary_accuracy: 0.5586 - precision: 0.6770 - recall: 0.2314 - val_loss: 0.6963 - val_binary_accuracy: 0.5660 - val_precision: 0.5846 - val_recall: 0.3363 Epoch 21/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6833 - binary_accuracy: 0.5608 - precision: 0.6686 - recall: 0.2484 - val_loss: 0.6904 - val_binary_accuracy: 0.5745 - val_precision: 0.5730 - val_recall: 0.4513 Epoch 22/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6904 - binary_accuracy: 0.5533 - precision: 0.5586 - recall: 0.5265 - val_loss: 0.6867 - val_binary_accuracy: 0.5362 - val_precision: 0.5213 - val_recall: 0.4336 Epoch 23/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6887 - binary_accuracy: 0.5704 - precision: 0.6269 - recall: 0.3567 - val_loss: 0.6881 - val_binary_accuracy: 0.5362 - val_precision: 0.5167 - val_recall: 0.5487 Epoch 24/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6877 - binary_accuracy: 0.5618 - precision: 0.5612 - recall: 0.5839 - val_loss: 0.6907 - val_binary_accuracy: 0.5191 - val_precision: 0.5000 - val_recall: 0.3274 Epoch 25/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6782 - binary_accuracy: 0.5768 - precision: 0.5787 - recall: 0.5775 - val_loss: 0.6855 - val_binary_accuracy: 0.5532 - val_precision: 0.5476 - val_recall: 0.4071 Epoch 26/150 30/30 [==============================] 
- 0s 1ms/step - loss: 0.6714 - binary_accuracy: 0.5949 - precision: 0.6201 - recall: 0.4989 - val_loss: 0.6854 - val_binary_accuracy: 0.5532 - val_precision: 0.5606 - val_recall: 0.3274 Epoch 27/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6963 - binary_accuracy: 0.5682 - precision: 0.5902 - recall: 0.4586 - val_loss: 0.6953 - val_binary_accuracy: 0.5319 - val_precision: 0.5149 - val_recall: 0.4602 Epoch 28/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6867 - binary_accuracy: 0.5618 - precision: 0.5633 - recall: 0.5669 - val_loss: 0.6904 - val_binary_accuracy: 0.5915 - val_precision: 0.6049 - val_recall: 0.4336 Epoch 29/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6825 - binary_accuracy: 0.5640 - precision: 0.5654 - recall: 0.5690 - val_loss: 0.6865 - val_binary_accuracy: 0.5404 - val_precision: 0.5333 - val_recall: 0.3540 Epoch 30/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6719 - binary_accuracy: 0.5864 - precision: 0.5856 - recall: 0.6030 - val_loss: 0.6806 - val_binary_accuracy: 0.5574 - val_precision: 0.5495 - val_recall: 0.4425 Epoch 31/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6711 - binary_accuracy: 0.5938 - precision: 0.5849 - recall: 0.6582 - val_loss: 0.6785 - val_binary_accuracy: 0.5702 - val_precision: 0.5682 - val_recall: 0.4425 Epoch 32/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6829 - binary_accuracy: 0.5789 - precision: 0.5754 - recall: 0.6157 - val_loss: 0.6907 - val_binary_accuracy: 0.5532 - val_precision: 0.5357 - val_recall: 0.5310 Epoch 33/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6812 - binary_accuracy: 0.5853 - precision: 0.5940 - recall: 0.5499 - val_loss: 0.6814 - val_binary_accuracy: 0.5745 - val_precision: 0.5783 - val_recall: 0.4248 Epoch 34/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6829 - binary_accuracy: 0.5704 - precision: 0.6318 - recall: 0.3461 - 
val_loss: 0.6807 - val_binary_accuracy: 0.5532 - val_precision: 0.5364 - val_recall: 0.5221 Epoch 35/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6669 - binary_accuracy: 0.5949 - precision: 0.5966 - recall: 0.5966 - val_loss: 0.6804 - val_binary_accuracy: 0.5872 - val_precision: 0.5909 - val_recall: 0.4602 Epoch 36/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6600 - binary_accuracy: 0.5896 - precision: 0.6536 - recall: 0.3885 - val_loss: 0.6780 - val_binary_accuracy: 0.5702 - val_precision: 0.5600 - val_recall: 0.4956 Epoch 37/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6700 - binary_accuracy: 0.5970 - precision: 0.5951 - recall: 0.6178 - val_loss: 0.6808 - val_binary_accuracy: 0.5787 - val_precision: 0.5583 - val_recall: 0.5929 Epoch 38/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6724 - binary_accuracy: 0.6066 - precision: 0.5944 - recall: 0.6815 - val_loss: 0.6849 - val_binary_accuracy: 0.5745 - val_precision: 0.5747 - val_recall: 0.4425 Epoch 39/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6803 - binary_accuracy: 0.5885 - precision: 0.5859 - recall: 0.6157 - val_loss: 0.6855 - val_binary_accuracy: 0.5745 - val_precision: 0.5596 - val_recall: 0.5398 Epoch 40/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6692 - binary_accuracy: 0.6045 - precision: 0.6008 - recall: 0.6327 - val_loss: 0.6803 - val_binary_accuracy: 0.6085 - val_precision: 0.6061 - val_recall: 0.5310 Epoch 41/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6593 - binary_accuracy: 0.5970 - precision: 0.6154 - recall: 0.5265 - val_loss: 0.6730 - val_binary_accuracy: 0.6043 - val_precision: 0.5943 - val_recall: 0.5575 Epoch 42/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6577 - binary_accuracy: 0.6119 - precision: 0.6348 - recall: 0.5350 - val_loss: 0.6688 - val_binary_accuracy: 0.5915 - val_precision: 0.5714 - val_recall: 0.6018 
Epoch 43/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6724 - binary_accuracy: 0.5800 - precision: 0.5765 - recall: 0.6157 - val_loss: 0.6730 - val_binary_accuracy: 0.5872 - val_precision: 0.5714 - val_recall: 0.5664 Epoch 44/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6699 - binary_accuracy: 0.5959 - precision: 0.5970 - recall: 0.6008 - val_loss: 0.6734 - val_binary_accuracy: 0.5830 - val_precision: 0.5758 - val_recall: 0.5044 Epoch 45/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6607 - binary_accuracy: 0.6258 - precision: 0.6277 - recall: 0.6263 - val_loss: 0.6814 - val_binary_accuracy: 0.5617 - val_precision: 0.5472 - val_recall: 0.5133 Epoch 46/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6557 - binary_accuracy: 0.6055 - precision: 0.6050 - recall: 0.6178 - val_loss: 0.6713 - val_binary_accuracy: 0.6170 - val_precision: 0.5966 - val_recall: 0.6283 Epoch 47/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6434 - binary_accuracy: 0.6311 - precision: 0.6200 - recall: 0.6858 - val_loss: 0.6818 - val_binary_accuracy: 0.5532 - val_precision: 0.5294 - val_recall: 0.6372 Epoch 48/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6770 - binary_accuracy: 0.6098 - precision: 0.6052 - recall: 0.6412 - val_loss: 0.6828 - val_binary_accuracy: 0.5532 - val_precision: 0.5317 - val_recall: 0.5929 Epoch 49/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6664 - binary_accuracy: 0.6258 - precision: 0.6154 - recall: 0.6794 - val_loss: 0.6751 - val_binary_accuracy: 0.5660 - val_precision: 0.5470 - val_recall: 0.5664 Epoch 50/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6644 - binary_accuracy: 0.6109 - precision: 0.6043 - recall: 0.6518 - val_loss: 0.6784 - val_binary_accuracy: 0.5787 - val_precision: 0.5603 - val_recall: 0.5752 Epoch 51/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6587 - 
binary_accuracy: 0.6109 - precision: 0.6011 - recall: 0.6688 - val_loss: 0.6917 - val_binary_accuracy: 0.6043 - val_precision: 0.5980 - val_recall: 0.5398 Epoch 52/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6662 - binary_accuracy: 0.6290 - precision: 0.6253 - recall: 0.6518 - val_loss: 0.6782 - val_binary_accuracy: 0.5872 - val_precision: 0.5667 - val_recall: 0.6018 Epoch 53/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6602 - binary_accuracy: 0.6258 - precision: 0.6158 - recall: 0.6773 - val_loss: 0.6734 - val_binary_accuracy: 0.5574 - val_precision: 0.5446 - val_recall: 0.4867 Epoch 54/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6508 - binary_accuracy: 0.6119 - precision: 0.6055 - recall: 0.6518 - val_loss: 0.6792 - val_binary_accuracy: 0.6000 - val_precision: 0.5960 - val_recall: 0.5221 Epoch 55/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6413 - binary_accuracy: 0.6087 - precision: 0.6040 - recall: 0.6412 - val_loss: 0.6978 - val_binary_accuracy: 0.5787 - val_precision: 0.5515 - val_recall: 0.6637 Epoch 56/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6381 - binary_accuracy: 0.6141 - precision: 0.6143 - recall: 0.6221 - val_loss: 0.6706 - val_binary_accuracy: 0.6000 - val_precision: 0.5872 - val_recall: 0.5664 Epoch 57/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6567 - binary_accuracy: 0.6098 - precision: 0.6479 - recall: 0.4883 - val_loss: 0.6768 - val_binary_accuracy: 0.5872 - val_precision: 0.5851 - val_recall: 0.4867 Epoch 58/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6641 - binary_accuracy: 0.6215 - precision: 0.6343 - recall: 0.5817 - val_loss: 0.6709 - val_binary_accuracy: 0.5872 - val_precision: 0.5784 - val_recall: 0.5221 Epoch 59/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6379 - binary_accuracy: 0.6290 - precision: 0.6208 - recall: 0.6709 - val_loss: 0.7067 - 
val_binary_accuracy: 0.6085 - val_precision: 0.6000 - val_recall: 0.5575 Epoch 60/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6519 - binary_accuracy: 0.6279 - precision: 0.6235 - recall: 0.6539 - val_loss: 0.6701 - val_binary_accuracy: 0.6043 - val_precision: 0.6042 - val_recall: 0.5133 Epoch 61/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6424 - binary_accuracy: 0.6173 - precision: 0.6120 - recall: 0.6497 - val_loss: 0.6765 - val_binary_accuracy: 0.6255 - val_precision: 0.6238 - val_recall: 0.5575 Epoch 62/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6440 - binary_accuracy: 0.6130 - precision: 0.6011 - recall: 0.6815 - val_loss: 0.6736 - val_binary_accuracy: 0.5915 - val_precision: 0.5825 - val_recall: 0.5310 Epoch 63/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6615 - binary_accuracy: 0.6226 - precision: 0.6286 - recall: 0.6072 - val_loss: 0.6667 - val_binary_accuracy: 0.5957 - val_precision: 0.5978 - val_recall: 0.4867 Epoch 64/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6531 - binary_accuracy: 0.6279 - precision: 0.6309 - recall: 0.6242 - val_loss: 0.6596 - val_binary_accuracy: 0.6340 - val_precision: 0.6484 - val_recall: 0.5221 Epoch 65/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6541 - binary_accuracy: 0.6162 - precision: 0.6135 - recall: 0.6369 - val_loss: 0.6798 - val_binary_accuracy: 0.6213 - val_precision: 0.6176 - val_recall: 0.5575 Epoch 66/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6372 - binary_accuracy: 0.6439 - precision: 0.6436 - recall: 0.6518 - val_loss: 0.6738 - val_binary_accuracy: 0.6468 - val_precision: 0.6415 - val_recall: 0.6018 Epoch 67/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6348 - binary_accuracy: 0.6322 - precision: 0.6302 - recall: 0.6476 - val_loss: 0.6654 - val_binary_accuracy: 0.6170 - val_precision: 0.6139 - val_recall: 0.5487 Epoch 68/150 30/30 
[==============================] - 0s 1ms/step - loss: 0.6478 - binary_accuracy: 0.6151 - precision: 0.6087 - recall: 0.6539 - val_loss: 0.6795 - val_binary_accuracy: 0.6340 - val_precision: 0.6154 - val_recall: 0.6372 Epoch 69/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6381 - binary_accuracy: 0.6013 - precision: 0.5823 - recall: 0.7282 - val_loss: 0.6650 - val_binary_accuracy: 0.6468 - val_precision: 0.6389 - val_recall: 0.6106 Epoch 70/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6398 - binary_accuracy: 0.6343 - precision: 0.6693 - recall: 0.5372 - val_loss: 0.6723 - val_binary_accuracy: 0.6128 - val_precision: 0.5948 - val_recall: 0.6106 Epoch 71/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6443 - binary_accuracy: 0.6215 - precision: 0.6648 - recall: 0.4968 - val_loss: 0.6517 - val_binary_accuracy: 0.6128 - val_precision: 0.6196 - val_recall: 0.5044 Epoch 72/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6335 - binary_accuracy: 0.6087 - precision: 0.6102 - recall: 0.6115 - val_loss: 0.6509 - val_binary_accuracy: 0.6085 - val_precision: 0.6180 - val_recall: 0.4867 Epoch 73/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6326 - binary_accuracy: 0.6279 - precision: 0.6160 - recall: 0.6879 - val_loss: 0.6649 - val_binary_accuracy: 0.6298 - val_precision: 0.6275 - val_recall: 0.5664 Epoch 74/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6594 - binary_accuracy: 0.6045 - precision: 0.5954 - recall: 0.6624 - val_loss: 0.6542 - val_binary_accuracy: 0.6426 - val_precision: 0.6261 - val_recall: 0.6372 Epoch 75/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6323 - binary_accuracy: 0.6439 - precision: 0.6295 - recall: 0.7070 - val_loss: 0.6674 - val_binary_accuracy: 0.6383 - val_precision: 0.6429 - val_recall: 0.5575 Epoch 76/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6342 - binary_accuracy: 0.6450 - 
precision: 0.7197 - recall: 0.4798 - val_loss: 0.6683 - val_binary_accuracy: 0.6085 - val_precision: 0.6207 - val_recall: 0.4779 Epoch 77/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6388 - binary_accuracy: 0.6194 - precision: 0.6583 - recall: 0.5032 - val_loss: 0.6651 - val_binary_accuracy: 0.6043 - val_precision: 0.6042 - val_recall: 0.5133 Epoch 78/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6359 - binary_accuracy: 0.6450 - precision: 0.6667 - recall: 0.5860 - val_loss: 0.6649 - val_binary_accuracy: 0.6383 - val_precision: 0.6522 - val_recall: 0.5310 Epoch 79/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6430 - binary_accuracy: 0.6322 - precision: 0.6250 - recall: 0.6688 - val_loss: 0.6680 - val_binary_accuracy: 0.6426 - val_precision: 0.6706 - val_recall: 0.5044 Epoch 80/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6356 - binary_accuracy: 0.6588 - precision: 0.6525 - recall: 0.6858 - val_loss: 0.6866 - val_binary_accuracy: 0.6085 - val_precision: 0.6235 - val_recall: 0.4690 Epoch 81/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6442 - binary_accuracy: 0.6162 - precision: 0.6183 - recall: 0.6157 - val_loss: 0.6745 - val_binary_accuracy: 0.6000 - val_precision: 0.6118 - val_recall: 0.4602 Epoch 82/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6351 - binary_accuracy: 0.6247 - precision: 0.6207 - recall: 0.6497 - val_loss: 0.6720 - val_binary_accuracy: 0.6213 - val_precision: 0.6200 - val_recall: 0.5487 Epoch 83/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6555 - binary_accuracy: 0.6205 - precision: 0.6676 - recall: 0.4862 - val_loss: 0.6668 - val_binary_accuracy: 0.6128 - val_precision: 0.6019 - val_recall: 0.5752 Epoch 84/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6293 - binary_accuracy: 0.6301 - precision: 0.6483 - recall: 0.5754 - val_loss: 0.6738 - val_binary_accuracy: 0.5830 - 
val_precision: 0.5641 - val_recall: 0.5841 Epoch 85/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6195 - binary_accuracy: 0.6269 - precision: 0.6613 - recall: 0.5265 - val_loss: 0.6626 - val_binary_accuracy: 0.6255 - val_precision: 0.6344 - val_recall: 0.5221 Epoch 86/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6398 - binary_accuracy: 0.6045 - precision: 0.6562 - recall: 0.4459 - val_loss: 0.6493 - val_binary_accuracy: 0.6383 - val_precision: 0.6429 - val_recall: 0.5575 Epoch 87/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6409 - binary_accuracy: 0.6429 - precision: 0.7615 - recall: 0.4204 - val_loss: 0.6614 - val_binary_accuracy: 0.6170 - val_precision: 0.6264 - val_recall: 0.5044 Epoch 88/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6260 - binary_accuracy: 0.6450 - precision: 0.7782 - recall: 0.4098 - val_loss: 0.6799 - val_binary_accuracy: 0.6000 - val_precision: 0.5826 - val_recall: 0.5929 Epoch 89/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6460 - binary_accuracy: 0.6215 - precision: 0.7000 - recall: 0.4310 - val_loss: 0.6574 - val_binary_accuracy: 0.6255 - val_precision: 0.6316 - val_recall: 0.5310 Epoch 90/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6278 - binary_accuracy: 0.6322 - precision: 0.7114 - recall: 0.4501 - val_loss: 0.6662 - val_binary_accuracy: 0.5957 - val_precision: 0.5978 - val_recall: 0.4867 Epoch 91/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6296 - binary_accuracy: 0.6535 - precision: 0.6952 - recall: 0.5520 - val_loss: 0.6457 - val_binary_accuracy: 0.6426 - val_precision: 0.6408 - val_recall: 0.5841 Epoch 92/150 30/30 [==============================] - 0s 1ms/step - loss: 0.6285 - binary_accuracy: 0.6322 - precision: 0.6221 - recall: 0.6815 - val_loss: 0.6416 - val_binary_accuracy: 0.6255 - val_precision: 0.6437 - val_recall: 0.4956 Epoch 93/150 30/30 [==============================] 
- 0s 1ms/step - loss: 0.6250 - binary_accuracy: 0.6418 - precision: 0.6477 - recall: 0.6285 - val_loss: 0.6577 - val_binary_accuracy: 0.6128 - val_precision: 0.6250 - val_recall: 0.4867
Epoch 94/150
30/30 [==============================] - 0s 1ms/step - loss: 0.6520 - binary_accuracy: 0.5842 - precision: 0.5697 - recall: 0.7028 - val_loss: 0.6589 - val_binary_accuracy: 0.6000 - val_precision: 0.6000 - val_recall: 0.5044
...
Epoch 149/150
30/30 [==============================] - 0s 1ms/step - loss: 0.6350 - binary_accuracy: 0.6258 - precision: 0.6195 - recall: 0.6603 - val_loss: 0.6685 - val_binary_accuracy: 0.6128 - val_precision: 0.6058 - val_recall: 0.5575
Epoch 150/150
30/30 [==============================] - 0s 1ms/step - loss: 0.6135 - binary_accuracy: 0.6525 - precision: 0.6520 - recall: 0.6603 - val_loss: 0.6837 - val_binary_accuracy: 0.6255 - val_precision: 0.6214 - val_recall: 0.5664
Both the training and validation loss decline steadily over the first 125 epochs. At that point the validation loss begins to level off, and the model can no longer improve on it. By epoch 150 the model appears to be slightly overfitting (0.61 train vs. 0.68 validation loss), but the gap is not extreme and does not yet affect the other metrics. Binary accuracy climbs steadily for both the training and validation sets throughout training, though the gradual climb begins to taper off around 0.62. While this may seem low, context is important: if an algorithm existed that could successfully pick the top 50% of stocks with 90% accuracy, everyone would adopt it, and as adoption grew, its accuracy and predictive power would decline. As such, the 0.62-0.65 accuracy range appears reasonable. The precision figures indicate that the model does a reasonable job of avoiding bad stocks. The weakest number is in fact the recall, which shows that the model has difficulty isolating good stocks, i.e., it produces more false negatives.
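For context, precision and recall can be read straight off a confusion matrix. A minimal sketch with made-up labels for ten stocks (illustrative only, not taken from the model output above):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for ten stocks: 1 = at/above the median return, 0 = below.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# Precision: of the stocks flagged as "good", how many truly were?
# High precision means few bad stocks slip into the portfolio.
print(precision_score(y_true, y_pred))  # 3 TP / (3 TP + 1 FP) = 0.75

# Recall: of the truly good stocks, how many did the model find?
# Low recall means good stocks are missed, i.e. more false negatives.
print(recall_score(y_true, y_pred))     # 3 TP / (3 TP + 2 FN) = 0.60
```

In this toy case the classifier admits only one bad stock (decent precision) but misses two good ones (weaker recall), which mirrors the pattern in the validation metrics.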
color_list = ['#6929c4', '#1192e8', '#005d5d', '#9f1853', '#fa4d56', '#570408', '#198038', '#002d9c', '#ee538b', '#b28600', '#009d9a', '#012749']
fig = make_subplots(rows=2, cols=2, shared_yaxes=False, vertical_spacing=0.1,
                    subplot_titles=('loss', 'binary_accuracy', 'precision', 'recall'))
epochs = list(range(1, len(history.history['loss']) + 1))
# one subplot per metric, train/val traces in paired colors
for i, metric in enumerate(['loss', 'binary_accuracy', 'precision', 'recall']):
    row, col = i // 2 + 1, i % 2 + 1
    fig.add_trace(go.Scatter(x=epochs, y=history.history[metric], name='train',
                             line={'color': color_list[2 * i]}), row=row, col=col)
    fig.add_trace(go.Scatter(x=epochs, y=history.history['val_' + metric], name='val',
                             line={'color': color_list[2 * i + 1]}), row=row, col=col)
#fig.update_yaxes(range=[0.5,1], row = 1, col = 1)
fig.update_layout(
paper_bgcolor='white',
plot_bgcolor= 'white',
margin=dict(l=48, r=48, t=100, b=10),
height = 700,
width = 900,
showlegend=False,
title = dict(text = 'ANN Multifactor Model Results', x = 0.5, y = 0.97))
fig.update_layout(hovermode="x unified")
### The model history is stored in a CSV file for later use.
ml_stats = pd.DataFrame.from_dict(history.history)
ml_stats.to_csv('assets/models/jeff_multi_factor/ml_stats/mf_mlp.csv', index=False)
clear_session()
We are now ready to put our model to use in a simulated environment. To do this, we first build a pipeline that takes in raw data and creates features that are free of outliers and properly scaled, along with a label based on the one-year forward return of each stock at each point in time. This data is then used to train our model, which in turn classifies all stocks based on the next day's data. Simply put, we use three months of data to train the model and make one set of predictions from it. That data is then discarded, and a fresh set of data is used to train the next iteration.
Regime change, in the context of the stock market, is when "something changes" and the market no longer behaves the way it had previously. It is precisely the issue of regime change that makes stock selection particularly difficult to do successfully over long periods of time. Regime change is always obvious in hindsight and almost never visible in advance. Clearly, any model that searches for patterns in data can have a particularly difficult time classifying properly once stocks begin to behave differently than they had previously. To deal with this, our model is constructed to discard old data once it is used and never re-introduce it into the models. This is also the reason the model is structured to "trade" quarterly.
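The "train on one quarter, predict once, discard" scheme described above can be sketched as a simple walk-forward loop. This is an illustration only: the random data and the `LogisticRegression` stand-in are hypothetical placeholders for the project's SQL-sourced factor blocks and the MLP defined below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Hypothetical quarterly data blocks keyed by trade date (random stand-ins,
# not the project's actual factor data).
quarters = {d: (rng.normal(size=(100, 5)), rng.integers(0, 2, 100))
            for d in ['2020-01-02', '2020-04-01', '2020-07-01']}

predictions = {}
for d, (X, y) in quarters.items():
    model = LogisticRegression()            # fresh model each quarter (stand-in for the MLP)
    model.fit(X, y)                         # train only on this quarter's block
    predictions[d] = model.predict(X[-1:])  # one classification pass per trade date
    del model, X, y                         # data from the old regime is never reused
```

Because each iteration starts from an untrained model, patterns learned under an earlier regime cannot leak into later predictions.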
### Function imports data to be used for prediction after the model is trained
def import_trade_data(trade_date):
    """
    :param trade_date: a string, must be in YYYY-MM-DD format
    Calls raw data for all stocks on a specific date from the SQL database.
    """
    conn = connect(dbname='697_temp', user='postgres', host='********', password='********')
    # parameterized query avoids formatting SQL strings by hand
    query = "SELECT * FROM raw_data WHERE date = %s"
    dat = pd.read_sql_query(query, conn, params=(trade_date,))
    conn.close()
    return dat
def import_px_data(trade_date):
    """
    :param trade_date: a string, must be in YYYY-MM-DD format
    Calls price data for all stocks on a specific date from the SQL database.
    """
    conn = connect(dbname='697_temp', user='postgres', host='********', password='********')
    query = "SELECT * FROM px_dat WHERE index = %s"
    dat = pd.read_sql_query(query, conn, params=(trade_date,))
    conn.close()
    return dat
### Pipeline
def run_model(trade_date):
    """
    :param trade_date: a string, must be in YYYY-MM-DD format
    Runs our model pipeline from start to finish.
    Returns a df of classifications for each stock.
    """
    seed(42)
    tf.random.set_seed(42)
    q = pd.to_datetime(trade_date).quarter
    yr = pd.to_datetime(trade_date).year - 1
    training_block = pd.read_csv("assets/models/jeff_multi_factor/block_data{}_{}.csv".format(q, yr), index_col=0)
    training_block = training_block.drop(['3m_ret', '6m_ret', 'Instrument', 'date'], axis=1)
    # label: 1 if the stock's one-year return is at or above the cross-sectional median
    training_block['label'] = [1 if i >= training_block['1yr_ret'].median() else 0 for i in training_block['1yr_ret']]
    training_block = training_block.drop(['1yr_ret'], axis=1)
    training_block = training_block.replace([np.inf, -np.inf], np.nan)
    training_block = training_block.dropna()
    data = training_block.values
    X, y = data[:, :-1], data[:, -1]
    print('No of observations pre outlier removal: {}'.format(X.shape[0]))
    pre = X.shape[0]
    iso = IsolationForest(contamination=0.2)
    yhat = iso.fit_predict(X)
    mask = yhat != -1
    X, y = X[mask, :], y[mask]
    print('No of observations after outlier removal: {}'.format(X.shape[0]))
    percent_kept = X.shape[0] / pre
    print('{:.2%} of the data was kept'.format(percent_kept))
    scaler = StandardScaler().fit(X)
    scaled_X = scaler.transform(X)
    model = Sequential()
    model.add(Dense(15, activation='relu', input_dim=69, activity_regularizer=l2(0.002)))
    model.add(Dropout(0.5))
    model.add(Dense(8, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=Adam(learning_rate=0.01),
                  metrics=['binary_accuracy', 'Precision', 'Recall'])
    model.fit(scaled_X, y, epochs=148, batch_size=32, verbose=0)
    score = model.evaluate(scaled_X, y, verbose=0)
    print('Model loss:', score[0])
    print('Model accuracy:', score[1])
    print('Model Precision:', score[2])
    print('Model Recall:', score[3])
    input_raw = import_trade_data(trade_date).dropna(subset=['12m_volume'])
    results = list(input_raw['1yr_ret'])
    stocks = list(input_raw['Instrument'])
    input_raw = input_raw.replace([np.inf, -np.inf], np.nan)
    input_raw = input_raw.drop(['3m_ret', '6m_ret', 'Instrument', 'date', '1yr_ret'], axis=1)
    mkt_data = input_raw.values
    scaled_mkt = scaler.transform(mkt_data)
    # predict_proba/predict_classes were removed from Keras; derive classes from probabilities
    y_prob = model.predict(scaled_mkt)
    y_class = (y_prob > 0.5).astype(int)
    output_df = input_raw.copy()
    output_df['Instrument'] = stocks
    output_df['class_pred'] = y_class
    output_df['probab'] = y_prob
    try:
        output_df['True'] = [1 if i >= np.median(results) else 0 for i in results]
    except Exception:
        output_df['True'] = ['unknown' for i in range(0, len(results))]
    clear_session()
    return output_df
def construct_portfolio(output_df, trade_date):
    """
    :param output_df: a df produced by the model pipeline function
    :param trade_date: a string, must be in YYYY-MM-DD format
    Returns a df of stocks used to build the portfolio.
    """
    stocks = output_df.filter(['3m_avg_mkt_cap', 'Instrument', 'class_pred', 'probab'])
    # keep the 50 highest-probability "good" stocks, then cap-weight them
    stocks = stocks[stocks['class_pred'] == 1].sort_values('probab', ascending=False).reset_index(drop=True)[0:50]
    stocks = stocks.sort_values('3m_avg_mkt_cap', ascending=False)
    stocks['weights'] = stocks['3m_avg_mkt_cap'] / stocks['3m_avg_mkt_cap'].sum()
    # drop positions under 2% and renormalize the remaining weights
    stocks = stocks[stocks['weights'] >= 0.02]
    stocks['weights'] = stocks['weights'] / stocks['weights'].sum()
    px_dat = import_px_data(trade_date).drop('index', axis=1)
    stocks = stocks.merge(px_dat, on='Instrument', how='inner')
    return stocks
def backtest_prep():
    """
    Loops through trade dates, runs the model pipeline, and generates a list of trades to place based on model results.
    """
    dates = ['2019-10-01', '2020-01-02', '2020-04-01', '2020-07-01', '2020-10-01', '2021-01-04', '2021-04-01']
    transaction_df = pd.DataFrame()
    for d in dates:
        stocks = run_model(d)
        port = construct_portfolio(stocks, d)
        port['Date'] = d
        to_df = port.filter(['Date', 'Instrument', 'weights', 'close'])
        transaction_df = pd.concat([transaction_df, to_df])
    transaction_df.to_csv('assets/models/jeff_multi_factor/bt_transactions.csv', index=False)
    return
backtest_prep()
No of observations pre outlier removal: 489
No of observations after outlier removal: 391
79.96% of the data was kept
Model loss: 0.45085418224334717
Model accuracy: 0.7851662635803223
Model Precision: 0.7137546539306641
Model Recall: 0.9648241400718689
No of observations pre outlier removal: 1467
No of observations after outlier removal: 1173
79.96% of the data was kept
Model loss: 0.5391020774841309
Model accuracy: 0.7706735134124756
Model Precision: 0.8387096524238586
Model Recall: 0.6678082346916199
No of observations pre outlier removal: 1467
No of observations after outlier removal: 1173
79.96% of the data was kept
Model loss: 0.4786199629306793
Model accuracy: 0.7962489128112793
Model Precision: 0.8758314847946167
Model Recall: 0.6833909749984741
No of observations pre outlier removal: 1469
No of observations after outlier removal: 1175
79.99% of the data was kept
Model loss: 0.49982428550720215
Model accuracy: 0.7651063799858093
Model Precision: 0.9522387981414795
Model Recall: 0.5509499311447144
No of observations pre outlier removal: 1477
No of observations after outlier removal: 1181
79.96% of the data was kept
Model loss: 0.44042694568634033
Model accuracy: 0.8264182806015015
Model Precision: 0.8534798622131348
Model Recall: 0.7884940505027771
No of observations pre outlier removal: 1476
No of observations after outlier removal: 1181
80.01% of the data was kept
Model loss: 0.4339868724346161
Model accuracy: 0.8323454856872559
Model Precision: 0.8336314558982849
Model Recall: 0.8161121010780334
No of observations pre outlier removal: 1477
No of observations after outlier removal: 1181
79.96% of the data was kept
Model loss: 0.3425152599811554
Model accuracy: 0.8755292296409607
Model Precision: 0.9163498282432556
Model Recall: 0.8239316344261169
#test.reset_account()
# Establish a portfolio using the port_2 class developed for this project
test = prt.portfolio(start_date='2019-10-01', value=1000000, end_date='2021-07-02')
The trading engine is one of the more difficult parts of this project. It requires that we simulate both a market and a financial services firm at the same time. This is done by taking the predetermined transactions from the list generated above, executing them at historical prices, and updating performance between dates. The process is also designed to ensure that sufficient funds exist for each transaction, that additional lots are properly accounted for, that performance is continuously calculated, that positions are known at all times, and that no short selling is allowed. Though there are similarities across models, each model has its own trading engine based on the guidelines established for that model.
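The `prt.portfolio` class itself is not shown here, but the funds check it performs before approving an order ("validated and approved" in the logs below) can be sketched as follows. This is a simplified illustration; `validate_buy` and the example price are hypothetical and not part of the project code.

```python
def validate_buy(cash, price, quantity):
    """Approve a buy order only if enough cash exists; no margin, no short selling."""
    if quantity <= 0:
        return False, "Quantity must be positive (short selling is not allowed)"
    cost = price * quantity
    if cost > cash:
        return False, "Insufficient funds"
    return True, "validated and approved"

# Illustrative check against a $1M starting account (price is made up).
ok, msg = validate_buy(cash=1_000_000, price=117.0, quantity=1577)
print(ok, msg)  # True validated and approved
```

A real engine would also debit the cash balance on approval and record the lot, so that subsequent orders in the same rebalance are validated against the remaining funds.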
transactions = pd.read_csv("assets/models/jeff_multi_factor/bt_transactions.csv")
transactions['Date'] = pd.to_datetime(transactions['Date'])
trade_dates = transactions['Date'].unique()
nyse = mcal.get_calendar('NYSE')
dates = nyse.schedule(start_date='2019-10-01', end_date='2021-07-02')['market_close'].index
date_prep = pd.DataFrame({'Dates':dates})
date_prep['Quarter'] = date_prep['Dates'].dt.quarter
date_prep['Year'] = date_prep['Dates'].dt.year
dates = list(date_prep.groupby(['Year', 'Quarter']).min()['Dates'])
dates = [i.strftime('%Y-%m-%d') for i in dates]
for d in dates[0:-1]:
    # mark existing positions to market
    port = test.open_positions_df
    px_exist = [test.get_price(d, i) for i in list(port['Ticker'])]
    port['Last'] = px_exist
    port['Current Value'] = port['Last'] * port['Quantity']
    # size the new positions; .copy() avoids pandas' SettingWithCopyWarning
    new_port = transactions[transactions['Date'] == d].copy()
    px_new = [test.get_price(d, i) for i in list(new_port['Instrument'])]
    new_port['close'] = px_new
    new_port['shares'] = ((new_port['weights'] * (port['Current Value'].sum() + test.current_cash)) // new_port['close']).astype(int)
    if len(port) != 0:
        # liquidate the old portfolio, then buy the new one
        stocks_to_sell = list(port['Ticker'])
        shares_to_sell = list(port['Quantity'])
        so = dict(zip(stocks_to_sell, shares_to_sell))
        stocks_to_buy = list(new_port['Instrument'])
        shares_to_buy = list(new_port['shares'])
        po = dict(zip(stocks_to_buy, shares_to_buy))
        test.execute_trades(so, date=d, t_type='Sell')
        test.execute_trades(po, date=d, t_type='buy')
    else:
        stocks_to_buy = list(new_port['Instrument'])
        shares_to_buy = list(new_port['shares'])
        po = dict(zip(stocks_to_buy, shares_to_buy))
        test.execute_trades(po, date=d, t_type='buy')
test.update_portfolio(date =dates[-1] , cash_add = None)
Order to BUY 1577 shares of WMT validated and approved
Order to BUY 577 shares of MA validated and approved
Order to BUY 1201 shares of CVX validated and approved
Order to BUY 301 shares of BA validated and approved
Order to BUY 253 shares of COST validated and approved
Order to BUY 398 shares of DHR validated and approved
Order to BUY 833 shares of ABBV validated and approved
Order to BUY 122 shares of CHTR validated and approved
Order to BUY 23 shares of BKNG validated and approved
Order to BUY 992 shares of MS validated and approved
Order to BUY 91 shares of BLK validated and approved
Order to BUY 503 shares of CL validated and approved
Order to BUY 112 shares of NOW validated and approved
Trades Executed
Order to SELL 1577 shares of WMT validated and approved
Order to SELL 577 shares of MA validated and approved
Order to SELL 1201 shares of CVX validated and approved
Order to SELL 301 shares of BA validated and approved
Order to SELL 253 shares of COST validated and approved
Order to SELL 398 shares of DHR validated and approved
Order to SELL 833 shares of ABBV validated and approved
Order to SELL 122 shares of CHTR validated and approved
Order to SELL 23 shares of BKNG validated and approved
Order to SELL 992 shares of MS validated and approved
Order to SELL 91 shares of BLK validated and approved
Order to SELL 503 shares of CL validated and approved
Order to SELL 112 shares of NOW validated and approved
Trades Executed
Order to BUY 1644 shares of MSFT validated and approved
Order to BUY 106 shares of AMZN validated and approved
Order to BUY 584 shares of JNJ validated and approved
Order to BUY 665 shares of WMT validated and approved
Order to BUY 582 shares of PG validated and approved
Order to BUY 1055 shares of XOM validated and approved
Order to BUY 1811 shares of T validated and approved
Order to BUY 394 shares of DIS validated and approved
Order to BUY 993 shares of VZ validated and approved
Order to BUY 996 shares of WFC validated and approved
Order to BUY 1051 shares of CMCSA validated and approved
Order to BUY 218 shares of IBM validated and approved
Trades Executed
Order to SELL 1644 shares of MSFT validated and approved
Order to SELL 106 shares of AMZN validated and approved
Order to SELL 584 shares of JNJ validated and approved
Order to SELL 665 shares of WMT validated and approved
Order to SELL 582 shares of PG validated and approved
Order to SELL 1055 shares of XOM validated and approved
Order to SELL 1811 shares of T validated and approved
Order to SELL 394 shares of DIS validated and approved
Order to SELL 993 shares of VZ validated and approved
Order to SELL 996 shares of WFC validated and approved
Order to SELL 1051 shares of CMCSA validated and approved
Order to SELL 218 shares of IBM validated and approved
Trades Executed
Order to BUY 3992 shares of AAPL validated and approved
Order to BUY 94 shares of AMZN validated and approved
Order to BUY 68 shares of GOOGL validated and approved
Order to BUY 551 shares of WMT validated and approved
Order to BUY 207 shares of UNH validated and approved
Order to BUY 1890 shares of T validated and approved
Order to BUY 258 shares of HD validated and approved
Order to BUY 885 shares of VZ validated and approved
Order to BUY 1276 shares of XOM validated and approved
Order to BUY 1112 shares of CMCSA validated and approved
Order to BUY 533 shares of CVX validated and approved
Order to BUY 210 shares of BA validated and approved
Order to BUY 91 shares of COST validated and approved
Trades Executed
Order to SELL 3992 shares of AAPL validated and approved
Order to SELL 94 shares of AMZN validated and approved
Order to SELL 68 shares of GOOGL validated and approved
Order to SELL 551 shares of WMT validated and approved
Order to SELL 207 shares of UNH validated and approved
Order to SELL 1890 shares of T validated and approved
Order to SELL 258 shares of HD validated and approved
Order to SELL 885 shares of VZ validated and approved
Order to SELL 1276 shares of XOM validated and approved
Order to SELL 1112 shares of CMCSA validated and approved
Order to SELL 533 shares of CVX validated and approved
Order to SELL 210 shares of BA validated and approved
Order to SELL 91 shares of COST validated and approved
Trades Executed
Order to BUY 1662 shares of MSFT validated and approved
Order to BUY 527 shares of FB validated and approved
Order to BUY 68 shares of GOOGL validated and approved
Order to BUY 680 shares of JNJ validated and approved
Order to BUY 720 shares of WMT validated and approved
Order to BUY 391 shares of V validated and approved
Order to BUY 229 shares of MA validated and approved
Order to BUY 226 shares of UNH validated and approved
Order to BUY 1064 shares of INTC validated and approved
Order to BUY 247 shares of HD validated and approved
Order to BUY 1121 shares of XOM validated and approved
Order to BUY 484 shares of CVX validated and approved
Order to BUY 183 shares of MCD validated and approved
Order to BUY 111 shares of COST validated and approved
Trades Executed
Order to SELL 1662 shares of MSFT validated and approved
Order to SELL 527 shares of FB validated and approved
Order to SELL 68 shares of GOOGL validated and approved
Order to SELL 680 shares of JNJ validated and approved
Order to SELL 720 shares of WMT validated and approved
Order to SELL 391 shares of V validated and approved
Order to SELL 229 shares of MA validated and approved
Order to SELL 226 shares of UNH validated and approved
Order to SELL 1064 shares of INTC validated and approved
Order to SELL 247 shares of HD validated and approved
Order to SELL 1121 shares of XOM validated and approved
Order to SELL 484 shares of CVX validated and approved
Order to SELL 183 shares of MCD validated and approved
Order to SELL 111 shares of COST validated and approved
Trades Executed
Order to BUY 3038 shares of AAPL validated and approved
Order to BUY 1411 shares of MSFT validated and approved
Order to BUY 92 shares of AMZN validated and approved
Order to BUY 507 shares of WMT validated and approved
Order to BUY 142 shares of TSLA validated and approved
Order to BUY 177 shares of MA validated and approved
Order to BUY 200 shares of HD validated and approved
Order to BUY 99 shares of NVDA validated and approved
Order to BUY 148 shares of CRM validated and approved
Trades Executed
Order to SELL 3038 shares of AAPL validated and approved
Order to SELL 1411 shares of MSFT validated and approved
Order to SELL 92 shares of AMZN validated and approved
Order to SELL 507 shares of WMT validated and approved
Order to SELL 142 shares of TSLA validated and approved
Order to SELL 177 shares of MA validated and approved
Order to SELL 200 shares of HD validated and approved
Order to SELL 99 shares of NVDA validated and approved
Order to SELL 148 shares of CRM validated and approved
Trades Executed
Order to BUY 2987 shares of AAPL validated and approved
Order to BUY 1412 shares of MSFT validated and approved
Order to BUY 461 shares of FB validated and approved
Order to BUY 55 shares of GOOGL validated and approved
Order to BUY 130 shares of TSLA validated and approved
Order to BUY 302 shares of V validated and approved
Order to BUY 118 shares of NVDA validated and approved
Order to BUY 176 shares of MA validated and approved
Order to BUY 198 shares of PYPL validated and approved
Order to BUY 89 shares of ADBE validated and approved
Order to BUY 80 shares of NFLX validated and approved
Order to BUY 189 shares of CRM validated and approved
Trades Executed
Order to SELL 2987 shares of AAPL validated and approved
Order to SELL 1412 shares of MSFT validated and approved
Order to SELL 461 shares of FB validated and approved
Order to SELL 55 shares of GOOGL validated and approved
Order to SELL 130 shares of TSLA validated and approved
Order to SELL 302 shares of V validated and approved
Order to SELL 118 shares of NVDA validated and approved
Order to SELL 176 shares of MA validated and approved
Order to SELL 198 shares of PYPL validated and approved
Order to SELL 89 shares of ADBE validated and approved
Order to SELL 80 shares of NFLX validated and approved
Order to SELL 189 shares of CRM validated and approved
Trades Executed
Order to BUY 2479 shares of JPM validated and approved
Order to BUY 6536 shares of BAC validated and approved
Order to BUY 1543 shares of CVX validated and approved
Order to BUY 3186 shares of WFC validated and approved
Order to BUY 1547 shares of MS validated and approved
Order to BUY 1644 shares of C validated and approved
Order to BUY 434 shares of BA validated and approved
Order to BUY 281 shares of GS validated and approved
Order to BUY 1164 shares of USB validated and approved
Trades Executed
### Update the portfolio's performance between the last trading day and the end of the time horizon
test.update_portfolio(date=dates[-1], cash_add=None)
### Snapshots offer a view of the portfolio as it stood/stands at a point in time
snap_port = test.snapshots['Positions_2019-12-31']
snap_cash = test.snapshots['cash_2019-12-31']
### The track_record attribute is a record of the portfolio's performance over time
tr = test.track_record
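As a hypothetical sketch of what can be done with such a track record (assuming it is, or can be coerced to, a pandas Series of total portfolio value indexed by date — the actual attribute's structure may differ), daily and cumulative returns fall out directly:

```python
# Sketch: turn a track record of daily portfolio values into return series.
# The values below are made-up; the real series would come from test.track_record.
import pandas as pd

values = pd.Series(
    [100_000.0, 101_250.0, 100_800.0, 103_100.0],
    index=pd.to_datetime(["2019-12-31", "2020-01-02", "2020-01-03", "2020-01-06"]),
)

daily_returns = values.pct_change().dropna()          # day-over-day percentage change
cumulative = (1 + daily_returns).cumprod() - 1        # compounded growth since inception

print(f"Total return: {cumulative.iloc[-1]:.2%}")
```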
Portfolios are usually evaluated in both risk-adjusted and absolute terms. Evaluating a portfolio in absolute terms is often as simple as looking at performance over time and comparing it to a peer or benchmark, whereas evaluating it in risk-adjusted terms usually requires that the analyst "level the playing field" somehow (math is usually involved).
In absolute terms the portfolio's performance is decent for the most part. We do see periods of underperformance initially, but the model did a good job protecting the downside at the start of the COVID pandemic. In fact, at the market's trough on March 23, 2020, the market was down 24% from the start of the evaluation period, while the model was down only 15%. Since then the model has, for the most part, outperformed in absolute terms. The model did have some difficulty with the regime change as the market rotated from growth to value in late January 2021, but it recovered nicely after the next trading date.
pch.performance_chart(tr, 'SPY')
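The peak-to-trough losses discussed above can be quantified as drawdowns. A minimal sketch of that calculation, using toy numbers rather than the actual track record:

```python
# Sketch: measure peak-to-trough drawdown from a series of portfolio values.
# The values are illustrative; the real series would come from the track record.
import numpy as np

values = np.array([100.0, 104.0, 98.0, 85.0, 92.0, 106.0])
running_peak = np.maximum.accumulate(values)   # highest value seen so far
drawdowns = values / running_peak - 1          # percentage below that peak

print(f"Max drawdown: {drawdowns.min():.1%}")
```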
We use five major risk-adjusted metrics to evaluate our portfolio.
The risk-adjusted metrics paint a much brighter picture for the model, as all of them show the model outperforming the market nicely.
pch.risk_adjusted_metrics(tr,'SPY')
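To make "leveling the playing field" concrete, here is an illustrative sketch (not the `pch` implementation) of two of the most common risk-adjusted metrics, the Sharpe and Sortino ratios; the return data and risk-free rate below are made up:

```python
# Sketch: Sharpe and Sortino ratios from a series of daily returns.
import numpy as np

daily_returns = np.array([0.012, -0.008, 0.005, 0.009, -0.003, 0.007])
rf_daily = 0.0  # assume a 0% risk-free rate for simplicity

excess = daily_returns - rf_daily
sharpe = np.sqrt(252) * excess.mean() / excess.std(ddof=1)

# Sortino penalizes only downside volatility (simplified: std of negative returns)
downside = excess[excess < 0]
sortino = np.sqrt(252) * excess.mean() / downside.std(ddof=1)

print(f"Annualized Sharpe:  {sharpe:.2f}")
print(f"Annualized Sortino: {sortino:.2f}")
```

Because the Sortino ratio divides by downside deviation only, it rewards strategies whose volatility comes mostly from upside moves.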
The capital asset pricing model, or CAPM, is one of the most widely used models in finance. It is used to evaluate risk, calculate expected returns, and measure portfolio outperformance. The model regresses the portfolio's excess return on the market's excess return, R_p − R_f = α + β(R_m − R_f) + ε: beta measures the portfolio's sensitivity to market moves, and a positive, significant alpha indicates outperformance beyond what market exposure alone explains. It is interpreted as follows:
pch.capm_res(tr, "SPY")
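A minimal sketch of the regression behind such a CAPM analysis (not the `pch.capm_res` implementation; the excess-return data are toy values constructed to have a known beta and alpha):

```python
# Sketch: estimate CAPM beta and alpha by regressing the portfolio's excess
# returns on the market's excess returns with ordinary least squares.
import numpy as np

market_excess = np.array([0.010, -0.020, 0.015, 0.005, -0.010])
port_excess = 1.2 * market_excess + 0.001  # toy portfolio: beta 1.2, alpha 0.1%

beta, alpha = np.polyfit(market_excess, port_excess, 1)  # slope, intercept
print(f"beta  = {beta:.2f}")    # sensitivity to market moves
print(f"alpha = {alpha:.4f}")   # return not explained by market exposure
```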